Methylator - DNA cytosine methylation pipeline for plants and mammals

Jonas Bucher1, Masaomi Hatakeyama2,3, Ueli Grossniklaus1,4, Deepak Tanwar1,4


1 Plant Development Genetics, Department of Plant and Microbial Biology, University of Zurich
2 Evolutionary and Ecological Genomics, Department of Evolutionary Biology and Environmental Studies
3 Functional Genomics Center Zurich, ETH Zürich and University of Zurich
4 URPP Evolution in Action, Department of Plant and Microbial Biology, University of Zurich

jonas.bucher@uzh.ch | deepak.tanwar@evolution.uzh.ch

Introduction

DNA cytosine methylation is the addition of a methyl group to a cytosine base in DNA. It influences transcription and therefore plays a major role in several vital processes. In mammals, DNA cytosine methylation occurs predominantly in the CG sequence context, whereas in plants the CHG and CHH contexts (where H is A, C, or T) are common as well.


Figure 1: Methylation contexts in mammals and plants.
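
The three sequence contexts can be told apart from the two bases downstream of a cytosine. The following is a minimal sketch for the forward strand only; the function name `cytosine_context` and the example sequence are hypothetical and not part of Methylator.

```python
from typing import Optional


def cytosine_context(seq: str, pos: int) -> Optional[str]:
    """Return 'CG', 'CHG' or 'CHH' for the cytosine at seq[pos] (forward strand).

    Returns None if seq[pos] is not a cytosine or if the context cannot be
    determined (sequence end or ambiguous base N).
    """
    seq = seq.upper()
    if seq[pos] != "C":
        return None
    downstream = seq[pos + 1 : pos + 3]
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) < 2 or "N" in downstream:
        return None
    # The first downstream base is H (A, C or T); the second one decides.
    return "CHG" if downstream[1] == "G" else "CHH"


# Hypothetical example sequence with one cytosine per context.
example = "ACGTCAGCTTC"
contexts = {i: cytosine_context(example, i) for i, base in enumerate(example) if base == "C"}
print(contexts)  # {1: 'CG', 4: 'CHG', 7: 'CHH', 10: None}
```

Cytosines on the reverse strand would be handled the same way after reverse-complementing the sequence.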

Various tools have been introduced to facilitate the analysis of DNA cytosine methylation data. Usually, they focus on a small part of the workflow, which still leaves users with a considerable amount of work to evaluate appropriate tools, transform intermediate output, and finally generate presentable figures. Additionally, many tools are limited in terms of the input data, only providing support for certain species and/or sequencing methods.

Pipeline overview

Here we introduce Methylator, a user-friendly tool for complete DNA cytosine methylation analysis, with an easy-to-use interface, support for reproducible analyses, and interactive visualizations of results.


Figure 2: Overview of Methylator.

Duplicate removal

Although technical duplicate reads can arise from different sources, most deduplication tools focus on PCR duplicates. Clumpify can also handle other types of duplicates in sequencing data, such as optical duplicates. Because the abundance of each duplicate type depends on the sequencing technology used, Methylator adapts duplicate removal according to the user input.

Types of duplicates in sequencing data


Figure 3: Types of duplicates that can be present after sequencing. Adapted from biostars.org/p/229842/
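
The difference between optical and PCR duplicates can be illustrated with a small sketch. It assumes Illumina-style read names of the form `instrument:run:flowcell:lane:tile:x:y`, treats only reads with identical sequences as duplicates, and uses a hypothetical distance threshold (`optical_dist`); suitable thresholds depend on the sequencing platform. This is not how Clumpify or Methylator are implemented.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Read(NamedTuple):
    name: str  # assumed format: instrument:run:flowcell:lane:tile:x:y
    seq: str


def parse_location(name: str) -> Tuple[Tuple[str, str, str], int, int]:
    """Extract (flowcell, lane, tile) and the x/y cluster coordinates."""
    fields = name.split(" ")[0].split(":")
    return (fields[2], fields[3], fields[4]), int(fields[5]), int(fields[6])


def classify_duplicates(reads: Iterable[Read], optical_dist: int = 40) -> Dict[str, str]:
    """Label each read as 'kept', 'optical' or 'pcr'.

    The first read of every sequence is kept. A later identical read is
    'optical' if it lies on the same tile within optical_dist of an earlier
    copy, otherwise it is labelled 'pcr'.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[read.seq].append(read)

    labels = {}
    for group in groups.values():
        seen = []  # locations of earlier reads with the same sequence
        for read in group:
            tile, x, y = parse_location(read.name)
            if not seen:
                labels[read.name] = "kept"
            elif any(
                t == tile and (sx - x) ** 2 + (sy - y) ** 2 <= optical_dist ** 2
                for t, sx, sy in seen
            ):
                labels[read.name] = "optical"
            else:
                labels[read.name] = "pcr"
            seen.append((tile, x, y))
    return labels


# Hypothetical reads: three copies of one sequence plus one distinct read.
reads = [
    Read("M1:7:FC1:1:1101:1000:2000", "ACGTTGCA"),
    Read("M1:7:FC1:1:1101:1010:2005", "ACGTTGCA"),   # same tile, nearby cluster
    Read("M1:7:FC1:1:2204:15000:9000", "ACGTTGCA"),  # different tile
    Read("M1:7:FC1:1:1101:5000:5000", "TTGACCAA"),
]
labels = classify_duplicates(reads)
for name, label in labels.items():
    print(name, label)
```

Real deduplication tools additionally tolerate sequencing errors and handle paired-end reads, which this sketch ignores.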

Alignment

Bisulfite treatment of DNA reduces sequence complexity, which lowers mapping efficiency. As a result, a large proportion of the sequencing data is lost for analysis.
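
The loss of complexity can be made concrete with an in-silico conversion. The sketch below assumes a fully unmethylated forward-strand sequence, so every C is read as T; in real data, methylated cytosines remain C. The function names and the example sequence are hypothetical.

```python
def bisulfite_convert(seq: str) -> str:
    """In-silico bisulfite conversion, assuming no cytosine is methylated."""
    return seq.upper().replace("C", "T")


def distinct_kmers(seq: str, k: int) -> set:
    """Return the set of distinct k-mers in seq."""
    return {seq[i : i + k] for i in range(len(seq) - k + 1)}


original = "ACGATGA"
converted = bisulfite_convert(original)

# 'ACGA' and 'ATGA' are different before conversion but identical afterwards,
# so a converted read can no longer distinguish the two loci.
print(converted)                           # ATGATGA
print(len(distinct_kmers(original, 4)))    # 4
print(len(distinct_kmers(converted, 4)))   # 3
```

Because conversion maps several distinct k-mers onto the same converted k-mer, more reads match multiple locations or none uniquely, which is one reason fewer reads can be placed.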

Dirty Harry method

The Dirty Harry method improves the mapping rate by remapping the unaligned reads in local alignment mode. This increases mapping efficiency and retains a considerable number of cytosine sites that would otherwise be lost.


Figure 4: Dirty Harry alignment method. Adapted from Wu et al. (2019)
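
The two-pass idea can be sketched with a toy aligner. This is an illustration under strong assumptions, not the implementation used by Bismark or Methylator: exact string matching on fully C-to-T converted sequences stands in for end-to-end alignment, and the longest shared substring stands in for local alignment. The reference, the reads, and the `min_len` threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple


def convert(seq: str) -> str:
    """Fully C-to-T convert a sequence (forward strand only)."""
    return seq.upper().replace("C", "T")


def end_to_end(read: str, ref: str) -> Optional[int]:
    """First pass: the whole converted read must match the converted reference."""
    pos = convert(ref).find(convert(read))
    return pos if pos >= 0 else None


def local(read: str, ref: str, min_len: int) -> Optional[Tuple[int, int]]:
    """Second pass: keep the longest matching part of the read (toy soft-clipping)."""
    a, b = convert(read), convert(ref)
    match = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(
        0, len(a), 0, len(b)
    )
    if match.size < min_len:
        return None
    return match.b, match.size  # reference start, aligned length


def two_pass(reads: Dict[str, str], ref: str, min_len: int = 20) -> Dict[str, tuple]:
    """Align end-to-end first, then retry only the unaligned reads locally."""
    results = {}
    for name, seq in reads.items():
        pos = end_to_end(seq, ref)
        if pos is not None:
            results[name] = ("end-to-end", pos, len(seq))
            continue
        hit = local(seq, ref, min_len)
        results[name] = ("local", hit[0], hit[1]) if hit else ("unaligned", None, 0)
    return results


REF = "GATTACAGGCTTAACCGGTTAGCATCGATCGGATTCAGGCTA"
READS = {
    "r1": REF[5:30],                 # matches over its full length
    "r2": REF[10:32] + "GGGGGGGG",   # 22 matching bases followed by foreign sequence
    "r3": "ACACACACACACACACACACACAC",  # unrelated sequence
}
results = two_pass(READS, REF)
for name, result in results.items():
    print(name, result)
```

In this toy example, read `r2` would be discarded after the first pass, while the second pass recovers its matching portion and therefore the cytosine sites it covers.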

Visualization

For each type of analysis, several outputs are generated and visualized in a single interactive Shiny app. Publication-ready figures are created using colourblind-friendly palettes. Each plot can be customized and downloaded individually by the user.


Figure 5: Enrichment table.


Figure 6: Dot plot of biological processes enrichment.


Figure 7: Volcano plot of genomic regions enrichment.

Visual customization


Figure 8: Colour customization of figures.

Comparison with other tools

Feature | Methylator | methylseq | ARPEGGIO | MethylStar | MethylC-analyzer
Platform independent

Self-contained

Interface | GUI (SUSHI, Galaxy, Shiny), CLI | CLI | CLI | CLI | CLI, GUI
Input data | WGBS, RRBS, PBAT, TAPS, ABBS | WGBS, RRBS | WGBS | WGBS, PBAT, single-cell | WGBS, RRBS
Supported genomes | Mammals, Plants (incl. Polyploids) | General | Polyploids | Mammals, Plants | Mammals, Plants
Quality control

Alignment Bismark (incl. Dirty Harry), Arioc (GPU-based), EAGLE-RC Bismark, bwa-meth Bismark, EAGLE-RC Bismark

Deduplication Clumpify Bismark, Picard Bismark Bismark

Methylation context | All | All | All | All | All
Exploratory data analysis PCA, heatmaps, methylation summaries

PCA, heatmaps, methylation summaries
Differential methylation analysis DMRs, DMLs

DMRs and DMGs
Copy number variation analysis CNVkit

Functional analysis GO, KEGG, Reactome, user-defined

Motif analysis Homer

Visualization

Year published | In development | 2020 | 2021 | 2020 | 2023

Table 1: Comparison with features of widely used tools.

GUI = graphical user interface, CLI = command line interface
DMR/DML/DMG = Differentially Methylated Regions/Loci/Genes
GO = Gene Ontology

Conclusion

User-friendly, reproducible and sustainable DNA cytosine methylation data analysis pipeline
Support for plants & mammals
Can analyze methylation data from various library preparation methods for bulk sequencing
Interactive visualizations of results

References

Wu, Peng, Yan Gao, Weilong Guo, and Ping Zhu. 2019. “Using Local Alignment to Enhance Single-Cell Bisulfite Sequencing Data Efficiency.” Bioinformatics 35 (September): 3273–78. https://doi.org/10.1093/bioinformatics/btz125.